Target search on a dynamic DNA molecule 



Thomas Schötz,1 Richard A. Neher,2 and Ulrich Gerland1

1 Arnold Sommerfeld Center for Theoretical Physics and Center for Nano Science,
University of Munich, Theresienstr. 37, D-80333 München, Germany
2 Max-Planck-Institute for Developmental Biology, Spemannstr. 35, 72076 Tübingen, Germany

(Dated: September 16, 2011) 

We study a protein-DNA target search model with explicit DNA dynamics applicable to in vitro 
experiments. We show that the DNA dynamics plays a crucial role for the effectiveness of protein 
"jumps" between sites distant along the DNA contour but close in 3D space. A strongly binding 
protein that searches by 1D sliding and jumping alone explores the search space less redundantly
when the DNA dynamics is fast on the timescale of protein jumps than in the opposite "frozen DNA" 
limit. We characterize the crossover between these limits using simulations and scaling theory. We 
also rationalize the slow exploration in the frozen limit as a subtle interplay between long jumps 
and long trapping times of the protein in "islands" within random DNA configurations in solution. 



PACS numbers: 87.15.H-, 87.14.gk, 82.37.-j 

The quantitative characterization of proteins searching
for their specific target sites on long DNA molecules has
become a paradigmatic question of biological physics [T]- 
0]. The question is of considerable biological interest, 
since search processes of this type are key steps in cel- 
lular functions. For instance, in signal transduction, a 
protein belonging to the large class of transcription fac- 
tors conveys an external signal and triggers the appropri- 
ate genetic response by binding to specific target sites on 
the genomic DNA. Similarly, restriction enzymes, used 
by bacteria to fight invading viruses, search for cleav- 
age sites marked by specific DNA sequences. It is gen- 
erally assumed that the target search mechanism has 
been optimized by evolution, due to selective pressure 
for fast signaling and rapid responses in competitive en- 
vironments. From the physics perspective, the protein- 
DNA target search is a complex but tractable stochastic 
process that combines basic aspects of Brownian motion, 
polymer physics, and information theory [5, 14, 15]. Experimentally,
the search process can be probed on the single-molecule
level in vitro [?], and even in vivo [17].

Early in vitro experiments [2] indicated that the association
rate of lac repressor to its target site embedded
in short pieces of DNA is faster than the diffusion limit,
$k_a = 4\pi D b$, for a direct binding reaction with diffusion
constant $D$ and reaction radius $b$. Inspired by Adam
and Delbrück's idea that reduction of dimensionality is a
generic way to enhance reaction rates [18], Richter and
Eigen [3] interpreted these experiments with a two-step
mechanism where 3D diffusion and non-specific association
to DNA is followed by 1D diffusive sliding into the
target site. In a seminal series of papers [3], Berg, Winter,
and von Hippel then established much of what is known
today about the protein-DNA search kinetics. They
experimentally varied the non-specific binding strength via
the ion concentration, identified an optimum where the
search is fastest, and explained the behavior in a theoretical
analysis.



The existence of an optimum reflects a generic tradeoff 
in search processes for hidden targets [15]: A stochastic 
local search is exhaustive but redundant; interrupting the 
search by phases of rapid movement to new territory is 
a time investment that pays off by reducing the redun- 
dancy. The optimal fraction of time spent in each of the 
two "modes" depends on the statistical characteristics 
of the search mechanism. The simplest scenario, where 
proteins slide diffusively along the DNA, dissociate spon- 
taneously, and randomly reattach at uncorrelated posi- 
tions, leads to an optimum where, on average, only half 
of the proteins are bound somewhere on the DNA and 
the other half is in solution [3]. Physically, this is best
understood [9] in terms of the typical dwell times of a
protein in the sliding mode, $\tau_s$, and in the dissociated
state, $\tau_d$. The latter should be regarded as a fixed parameter,
set by cell size and composition, whereas $\tau_s$ can
be adapted by molecular evolution of the DNA-binding
domain of the protein (to adjust the non-specific affinity).
If $\tau_s < \tau_d$, the protein spends too little time searching,
while if $\tau_s > \tau_d$, the search is too redundant; the search
is fastest when they are equal.
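
One common way to make this optimum plausible is a back-of-the-envelope estimate. The sketch below assumes a 1D sliding diffusion constant $D_1$ (a symbol introduced here for illustration), a DNA of length $L$, and that each round of sliding scans a stretch $\ell_s \sim \sqrt{D_1 \tau_s}$ before an uncorrelated reattachment, so that about $L/\ell_s$ rounds of duration $\tau_s + \tau_d$ are needed:

```latex
\begin{align}
T_{\rm search}(\tau_s) &\sim \frac{L}{\ell_s}\,(\tau_s + \tau_d)
  \sim \frac{L}{\sqrt{D_1}}\left(\tau_s^{1/2} + \tau_d\,\tau_s^{-1/2}\right), \\
\frac{d}{d\tau_s}\left(\tau_s^{1/2} + \tau_d\,\tau_s^{-1/2}\right)
  &= \frac{1}{2}\,\tau_s^{-3/2}\,(\tau_s - \tau_d) = 0
  \quad\Rightarrow\quad \tau_s = \tau_d .
\end{align}
```

Under these assumptions the minimum lies at equal dwell times, which corresponds to the protein being bound half of the time.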

However, in bacterial cells, well studied transcription
factors are bound to DNA > 90% of the time [5]. This
fact has drawn attention to the 'intersegment transfer'
[?] of proteins within the same DNA molecule,
between sites close in space but distant along the contour.
Potentially, this process can destroy the redundancy of
the 1D search without the price of interrupting it by long
excursions into the solvent. The term was introduced for
proteins with two DNA-binding domains and refers to a
process during which the protein never detaches from the
DNA; a similar transfer but with a brief unbound period
is referred to as 'hopping' [?]. In both cases, the essential
difference to the uncorrelated random reattachment
discussed above is the correlated nature of the process:
Transfer does not occur with equal probability to every
site on the DNA, but to "linked" sites. Here, we simply
refer to both processes as 'jumping'.

FIG. 1: Illustration of the target search by sliding (1D diffusion)
and jumping on a dynamic polymer.

The interplay of protein sliding and jumping leads to
intricate search dynamics. An analytical study considered
the effect of jumps using the fractional Fokker-Planck
equation [20], which assumes that consecutive
jumps are uncorrelated, i.e. that the DNA configuration
randomizes between two jumps. In contrast, a numerical
study of sliding and jumping on a random but frozen contour
[21] showed that correlations between jumps drastically
alter the dynamics, leading to "paradoxical" quasi-diffusive
behavior instead of super-diffusion along the
contour. Specifically, the distribution of the protein on
the DNA exhibits characteristic heavy tails even though
its width increases only diffusively. These findings, and
the fact that the dynamics of real DNA is neither frozen
nor annealed over the relevant range of µs to s timescales
[3], call for an analysis of target search on a dynamic
DNA, see Fig. 1. Here, we characterize the crossover
between the frozen and the annealed regime using simulations
and scaling theory. We then study the mechanism
whereby correlated jumps create the paradoxical behavior
in the frozen limit.

Model. — To make the problem tractable, we describe
the DNA contour as a path of $L$ segments on a simple
cubic lattice, and generate its conformational dynamics
with a kinetic Monte Carlo scheme based on a generalized
Verdier-Stockmayer move set [22] with moves for kinks,
chain ends, and crankshafts, see Fig. S1. These moves,
carried out at rate $k_D$, implement Rouse dynamics on
a lattice for an ideal chain (no self-avoidance). We describe
a protein as a point particle on the lattice, which
diffuses along the DNA contour at rate $k_p$. If another
DNA segment passes through the same point, the protein
can randomly jump to it (at the same rate $k_p$, for
simplicity). We focus on the limit of strong DNA binding
without explicit 3D diffusion of the protein (although
jumps may involve 3D diffusion, as discussed above). As
initial condition, we use a random DNA configuration
with the protein on the central segment. Clearly, the
configuration of the DNA inside a bacterial cell is not
random, due to genome packaging and confinement, but
a random configuration is an interesting starting point
for exploration of the physical principles, and mimics the
situation of in vitro experiments.
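
To make the ingredients of the model concrete, the following minimal sketch covers only the frozen-contour case: it draws a random ideal-chain contour on a cubic lattice, collects the "links" (contour sites that share a lattice point), and moves a protein by sliding or jumping. It is an illustration under simplifying assumptions, not the simulation code used here: the DNA moves at rate $k_D$ are omitted, each protein step picks uniformly among the available moves instead of using proper kinetic Monte Carlo waiting times, and all function names are hypothetical.

```python
import random

# The six unit steps of a simple cubic lattice.
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def random_contour(n_segments, rng):
    """Return the n_segments + 1 lattice sites of an ideal-chain contour."""
    pos = (0, 0, 0)
    sites = [pos]
    for _ in range(n_segments):
        dx, dy, dz = rng.choice(STEPS)
        pos = (pos[0] + dx, pos[1] + dy, pos[2] + dz)
        sites.append(pos)
    return sites

def find_links(sites):
    """Map each contour index to the other indices at the same lattice point."""
    by_point = {}
    for s, point in enumerate(sites):
        by_point.setdefault(point, []).append(s)
    links = {}
    for indices in by_point.values():
        if len(indices) > 1:
            for s in indices:
                links[s] = [t for t in indices if t != s]
    return links

def protein_step(s, links, n_sites, rng):
    """One protein move: slide to a contour neighbour or jump along a link."""
    moves = [t for t in (s - 1, s + 1) if 0 <= t < n_sites]
    moves.extend(links.get(s, []))
    return rng.choice(moves)

rng = random.Random(1)
sites = random_contour(1000, rng)
links = find_links(sites)
position = len(sites) // 2  # protein starts on the central site
for _ in range(100):
    position = protein_step(position, links, len(sites), rng)
```

Storing the links as a dictionary keyed by contour index keeps the jump lookup cheap; a dynamic version would have to update this dictionary after every accepted DNA move.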

FIG. 2: Time evolution of the width $\Delta$ of the protein distribution
$P(s,t)$ for different kinetic ratios $k = k_D/k_p$. A
crossover from super-diffusive to quasi-diffusive dynamics occurs
for finite $k$.

Transport. — To characterize how a protein explores
the search space, we study the time evolution of its
probability distribution $P(s,t)$ along the DNA contour
($0 < s < L$). Fig. 2 plots its width $\Delta(t)$, defined as
the interquartile range $\Delta = I^{-1}(3/4) - I^{-1}(1/4)$ of the
cumulative distribution $I(y) = \int_0^y ds\, P(s,t)$, for different
kinetic ratios $k = k_D/k_p$. We obtain $P(s,t)$ by averaging
over more than $10^3$ simulations, with $L = 5000$ and different initial
DNA configurations. In the 'quenched limit' $k \to 0$
(squares), the protein moves on a frozen contour, and the
width grows quasi-diffusively with time, $\Delta \sim t^{1/2}$, despite
the long-range jumps along the contour and a heavy tail
of the distribution $P(s,t)$ at fixed $t$ [21]. In the opposite
'annealed limit' $k \to \infty$ (crosses, obtained by randomly
drawing a new DNA configuration after each jump), the
distribution initially spreads super-diffusively along the
contour, $\Delta \sim t^{\alpha}$ (here: $\alpha \approx 1.7$). The width saturates
at $\Delta \to L/2$ as the protein explores the entire DNA. In
the regime of intermediate $k$, which is relevant in most
experimental situations, $\Delta(t)$ displays a crossover from
super- to quasi-diffusive scaling. The curves for different
$k$ show that the crossover timescale $\tau_c$ increases with $k$.
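
The width defined above can be estimated directly from an ensemble of simulated protein positions. The sketch below is one possible implementation, assuming the positions at a fixed time are available as a plain sequence of numbers; the function name and the linear interpolation between order statistics are choices made here for illustration.

```python
def interquartile_width(positions):
    """Interquartile range of a sample of contour positions."""
    xs = sorted(positions)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one position")

    def quantile(q):
        # Linear interpolation between neighbouring order statistics.
        h = q * (n - 1)
        lo = int(h)
        hi = min(lo + 1, n - 1)
        return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

    return quantile(0.75) - quantile(0.25)

# Positions spread uniformly over 0..100 have half of the sample
# between the two quartiles.
width = interquartile_width(range(101))
```

Unlike the variance, this measure stays finite and well behaved when the distribution has heavy tails, which is presumably why a quantile-based width is used here.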

For large $k$, the connectivity of the DNA meshwork
on which the protein moves changes rapidly, such that
successive jumps are uncorrelated (they occur on different
link sets). One can then describe the dynamics by
the average jump probability $P(s, s')$ from site $s$ to site
$s'$, which is physically determined by the DNA looping
probability. For an ideal chain, this probability decays as
$|s - s'|^{-3/2}$ for large loops, before it is cut off by the finite
DNA length. When successive jump lengths are independently
drawn from this distribution, the typical distance
$\Delta$ from the initial position is dominated by the largest
jump, which grows with the number of jumps ($\sim t$) as
$\Delta(t) \sim t^2$ [23]. Indeed, our numerical exponent $\alpha$
approaches 2 at large $L$ (data not shown). However, what
does the transport $\Delta(t)$ imply for the target search process?
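
The growth of the largest jump with the number of jumps can be checked numerically. The sketch below is an illustration, not the simulation used here: it draws jump lengths with density proportional to $\ell^{-3/2}$ for $\ell \geq 1$ (a continuous stand-in for the looping probability that ignores the finite-length cutoff) by inverse-transform sampling, and estimates the median of the largest jump among $n$ jumps. All function names are hypothetical.

```python
import random
import statistics

def sample_jump_length(rng):
    """Draw a length with P(length > x) = x**-0.5 for x >= 1."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids division by zero
    return u ** -2.0

def largest_jump(n_jumps, rng):
    """Largest of n_jumps independent jump lengths."""
    return max(sample_jump_length(rng) for _ in range(n_jumps))

def median_largest_jump(n_jumps, n_trials, seed=0):
    """Median over n_trials of the largest jump among n_jumps."""
    rng = random.Random(seed)
    return statistics.median(largest_jump(n_jumps, rng) for _ in range(n_trials))

small = median_largest_jump(100, 201)
large = median_largest_jump(1000, 201)
ratio = large / small
```

Under these assumptions the median of the largest jump grows roughly as $n^2$, so a tenfold increase in the number of jumps should raise it by about two orders of magnitude, up to sampling noise.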

Search time. — Without a guiding "funnel", no search
process can be faster than linear exploration. A faster
than linear $\Delta(t)$ leads to "sloppy search" [11] where
patches dispersed over the entire contour are explored
before the target is located. This is precisely what is required
to break the redundancy of 1D diffusion, suggesting
that jumping is an effective mechanism that could
replace 3D diffusion in the annealed regime. On the
other hand, we expect that jumping is ineffective in the
frozen limit, as it leads only to quasi-diffusive spreading
along the DNA. To study the target search on a dynamic
DNA explicitly, we performed simulations with a
target site placed at different distances from the initial
protein position. Fig. S2A shows that the search indeed
takes increasingly longer as the DNA dynamics is slowed.
Fig. S2B shows that the strong dependence of the search
time on the initial distance to the target (at $k = 0$) becomes
weaker as $k$ is increased, see caption for details.

It will require single-molecule experiments of the type
of [17] (but under controlled in vitro conditions) to find
out which regime of $k$ values is biologically most relevant.
However, a rough estimate, based on the experimental
relaxation time of $\tau = 30$ s for the contour of a
$L = 43$ µm DNA fragment and the experimental scaling
law $\tau \sim L^{1.65}$ [25], indicates that on the ms-timescale
of protein jumps [3], only short DNA segments will be
equilibrated. We therefore expect that neither the annealed
nor the frozen limit, but the crossover regime will
be most relevant experimentally.
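
The estimate can be made explicit with a few lines of arithmetic. The sketch below extrapolates the quoted scaling law from the reference point (30 s at 43 µm) to find the contour length whose relaxation time matches a given timescale. Taking exactly 1 ms as the jump timescale, and extrapolating the power law this far below the measured range, are assumptions made here for illustration only.

```python
def equilibrated_length_um(time_s, tau_ref_s=30.0, length_ref_um=43.0, exponent=1.65):
    """Contour length (in µm) whose relaxation time equals time_s,
    assuming tau scales as L**exponent through the reference point."""
    return length_ref_um * (time_s / tau_ref_s) ** (1.0 / exponent)

length_1ms_um = equilibrated_length_um(1e-3)
print(f"equilibrated stretch at 1 ms: {length_1ms_um * 1000:.0f} nm")
```

With these numbers the equilibrated stretch comes out below 0.1 µm, far shorter than the 43 µm fragment, in line with the statement that only short segments equilibrate between jumps.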

Scaling of the crossover. — To understand the physics
of the crossover regime within our model, we apply a
scaling argument to the interplay of DNA and protein
dynamics: A DNA segment of length $\ell$ equilibrates on a
time scale $\tau \sim \ell^2$ (Rouse dynamics). Within a time $\tau$
after a protein docks onto the DNA and starts exploring,
it typically visits a DNA stretch $\Delta(\tau)$. During this time,
a DNA segment of size $\ell \sim (k_D \tau)^{1/2}$ equilibrates. Superdiffusive
protein transport results as long as $\Delta(t) < \ell$;
however, the fast growing $\Delta(t) \sim (k_p t)^{\alpha}$ quickly outruns
the "equilibration blob", and the passing point marks
the crossover to the quasi-diffusive regime. With $\alpha = 2$,
this crossover timescale $\tau_c$ then depends on the kinetic
ratio $k$ as $k_p \tau_c \sim k^{1/3}$. Our simulations cannot explore
a wide range of $k$ values due to computational cost and
do not allow a precise determination of this scaling (however,
the scaling exponent that best describes our limited
data deviates only by 0.08 from the expected value 1/3,
see Fig. S3). The small numerical value of the exponent
leads to a broad crossover as a function of $k$, again suggesting
that neither the annealed nor the frozen limit is
experimentally attainable.
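
The exponent 1/3 follows from equating the protein's explored stretch with the size of the equilibrated blob at the crossover time. As a worked sketch, written for a general transport exponent $\alpha$ (the general-$\alpha$ form is an extrapolation of the argument added for illustration, not a result stated above):

```latex
\begin{align}
\Delta(\tau_c) \sim \ell(\tau_c)
  \quad&\Rightarrow\quad (k_p \tau_c)^{\alpha} \sim (k_D \tau_c)^{1/2}
  = k^{1/2}\,(k_p \tau_c)^{1/2}, \\
(k_p \tau_c)^{\alpha - 1/2} \sim k^{1/2}
  \quad&\Rightarrow\quad k_p \tau_c \sim k^{1/(2\alpha - 1)}, \\
\alpha = 2
  \quad&\Rightarrow\quad k_p \tau_c \sim k^{1/3}.
\end{align}
```

Here $k = k_D/k_p$ was used in the first line. Because the exponent is small, changing $k$ by three orders of magnitude shifts the crossover time by only one.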

FIG. 3: The link diagram for a typical DNA conformation (A)
is separable into islands (green). Random reshuffling of the
same links destroys the islands (B). A toy model for transport
on the island structure leads to the dynamical phase diagram
(C), which explains the quasi-diffusive regime as a cancellation
of the effect of traps and long-range jumps.

Quenched limit. — To obtain a better understanding
of the mechanism responsible for the slowdown of the
search, we focus on the quenched limit. When first reported
[21], the quasi-diffusive transport was attributed
to correlation effects. However, what is the nature of
these correlations and how do they render the long-range
jumping process quasi-diffusive? We distinguish
two types of correlations, which we refer to as temporal
and spatial. On a static DNA, a protein can use the
same links multiple times, leading to temporal correlations.
Additionally, the positions of different links are
spatially correlated, since an existing link strongly enhances
the probability to find another link nearby (e.g.
a loop in the DNA favors further contacts within the
loop). To separate the effect of temporal and spatial correlations,
we destroy the latter by choosing a new random
starting point for each link while conserving its arc
length $|s - s'|$. The protein transport on such reshuffled
link sets is super-diffusive as revealed by simulations
shown in Fig. S4. Hence temporal correlations alone are
not sufficient to cause the quasi-diffusive behavior. A
simple argument makes this plausible: If the region visited
by the protein grows super-diffusively as $\Delta(t) \sim t^2$,
the protein visits only a fraction $\sim 1/t$ of the sites within
$\Delta$. Since it sees each site $O(1)$ times, it mostly uses novel
links and the persistence of links is unimportant.
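
The reshuffling step described above can be sketched as follows. This is a minimal illustration that assumes links are stored as pairs of contour indices on a contour with `n_sites` sites; the function name and data layout are hypothetical and not taken from the simulation code.

```python
import random

def reshuffle_links(links, n_sites, rng):
    """Give every link a new random start while keeping its arc length."""
    reshuffled = []
    for a, b in links:
        arc = abs(b - a)
        # Choose the start so that both ends stay on the contour.
        start = rng.randint(0, n_sites - 1 - arc)
        reshuffled.append((start, start + arc))
    return reshuffled

rng = random.Random(2)
original = [(3, 10), (40, 44), (12, 90)]
shuffled = reshuffle_links(original, n_sites=100, rng=rng)
```

The link length distribution is untouched by construction, so any change in transport between the original and the reshuffled link set can be attributed to the positions of the links relative to each other.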

Islands. — A striking consequence of the spatial correlations
is revealed in Fig. 3A, where all links in a typical
DNA configuration are depicted as arcs. The arcs cluster
into "islands" with many internal links but no links
between islands. These islands disappear when the same
links are randomly placed on the DNA, see Fig. 3B. Intuitively,
it is clear that the existence of islands slows
the exploration of the DNA, since the protein can move
from one island to another only by sliding. In fact, if
the islands had a well-defined typical size $\bar{s}$, the protein
dynamics would be diffusive on long scales $s \gg \bar{s}$. However,
the problem is more intricate, since the distribution
of island sizes has the same heavy tail $p(s) \sim s^{-3/2}$ as
the link length distribution, see Fig. S5. Nevertheless,
the existence of islands is a crucial clue; we show below
that it leads to a dynamics that can be described by a 1D
transport model with traps and long-distance jumps. To
this end, we first note two essential transport properties
of islands: (i) Due to the internal links, the position of a
protein is rapidly randomized within an island, such that
for most starting positions within an island, it leaves the
island with nearly the same probability to each side, see
Fig. S6. (ii) The typical trapping time within an island
scales as $\tau \sim s^{3/2}$ with the island size, see Fig. S7.
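
One way to extract islands from a link diagram is to treat every link as an interval on the contour and merge overlapping intervals. The sketch below assumes that an island is a maximal stretch of contour covered by overlapping or touching arcs; this reading of the link diagram is an assumption made for illustration, and the function name is hypothetical.

```python
def find_islands(links):
    """Merge link arcs into islands, returned as sorted (start, end) pairs."""
    intervals = sorted((min(a, b), max(a, b)) for a, b in links)
    islands = []
    for start, end in intervals:
        if islands and start <= islands[-1][1]:
            # The arc overlaps or touches the current island: extend it.
            islands[-1][1] = max(islands[-1][1], end)
        else:
            # A gap without any link separates this arc from the last island.
            islands.append([start, end])
    return [tuple(island) for island in islands]

islands = find_islands([(0, 5), (3, 8), (20, 25), (22, 23), (40, 41)])
sizes = [end - start for start, end in islands]
```

Between two consecutive islands no arc bridges the gap, so in this picture a protein can cross from one island to the next only by sliding.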

Given these properties, we consider protein transport
on an array of islands with sizes $s_j$ drawn from the distribution
$p(s)$. Each island has an associated trapping
time $\tau_j(s_j)$. It will be instructive to allow for adjustable
exponents $\mu$ and $\kappa$ in the scaling behavior, $p(s) \sim s^{-1-\mu}$
and $\tau \sim s^{\kappa}$. Combining these relations, we obtain a
distribution of trapping times $w(\tau) \sim \tau^{-1-\mu/\kappa}$, since
$w(\tau)\,d\tau = p(s)\,ds$. The transport behavior of the protein
in island space is then determined by the ratio of
the exponents: Using the first passage time calculus [24],
the typical time needed to move over $n$ islands is

$$
T(n) \sim n \sum_{j=1}^{n} \tau_j \sim
\begin{cases}
n^{1+\kappa/\mu} & \text{for } \kappa > \mu \\
n^{2} & \text{for } \kappa < \mu
\end{cases}
\qquad (1)
$$

with the sum dominated by the largest term for the case
$\kappa > \mu$ while a typical trapping time exists for $\kappa < \mu$. To
map the dynamics in island space back onto the DNA,
note that the total DNA length $S$ of $n$ islands scales as

$$
S(n) \sim \sum_{j=1}^{n} s_j \sim
\begin{cases}
n^{1/\mu} & \text{for } \mu < 1 \\
n & \text{for } \mu > 1
\end{cases}
\qquad (2)
$$

as $S$ is dominated by the largest island for $\mu < 1$. Combining
(1) and (2) yields the transport behavior along
the DNA, i.e. the typical time to travel a given distance.
Fig. 3C shows the phase diagram spanned by the exponents
$\mu$ and $\kappa$. It exhibits four different regimes. For
$\mu > 1$, the distribution of island sizes has a well defined
mean and no super-diffusion can occur, but sub-diffusive
dynamics results when the trapping time distribution has
a sufficiently heavy tail ($\mu < \kappa$). If $\mu < 1$, the dynamics
is super-diffusive unless long trapping times in islands
compensate for long jumps. In particular, $T \sim S^{\mu+\kappa}$ for
$\mu < \kappa$, which includes the case of interest here, where the
two exponents precisely add up to 2, rationalizing quasi-diffusion
in the quenched limit. Within our more general
island model, a whole line of points exists where the
dynamics is quasi-diffusive. In contrast, for the protein
transport on the DNA contour, $\mu$ and $\kappa$ are not independent,
since they are both related to the statistics of the
network topologies created by the DNA conformations.
Why this leads to $\mu + \kappa = 2$ remains to be understood.
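
The four regimes of the phase diagram can be summarized by a single dynamic exponent $z$, defined through $T \sim S^{z}$, where $z = 2$ is (quasi-)diffusive, $z < 2$ super-diffusive and $z > 2$ sub-diffusive. The sketch below assumes the scaling forms $T(n) \sim n^{1+\kappa/\mu}$ for $\kappa > \mu$ and $T(n) \sim n^{2}$ otherwise, together with $S(n) \sim n^{1/\mu}$ for $\mu < 1$ and $S(n) \sim n$ otherwise. It ignores logarithmic corrections on the regime boundaries, and the function name is hypothetical.

```python
def transport_exponent(mu, kappa):
    """Return z such that the travel time scales as T ~ S**z."""
    # Time to cross n islands: T ~ n**a.
    a = 1.0 + kappa / mu if kappa > mu else 2.0
    # Contour length spanned by n islands: S ~ n**b.
    b = 1.0 / mu if mu < 1.0 else 1.0
    return a / b

# Island statistics on the frozen DNA: p(s) ~ s**-1.5 and tau ~ s**1.5,
# i.e. mu = 1/2 and kappa = 3/2.
z_dna = transport_exponent(0.5, 1.5)
```

For $\mu < 1$ and $\mu < \kappa$ the function reduces to $z = \mu + \kappa$, so the whole line $\mu + \kappa = 2$ in that regime is quasi-diffusive, and the frozen-DNA values sit on it.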

Conclusion. — We analyzed the transport and search
of proteins on a dynamic DNA contour. We showed that
the highly correlated nature of the protein dynamics persists
over a broad range of our dimensionless dynamic
parameter $k = k_D/k_p$ and significantly slows down the
search process. Our findings imply that under the in
vitro conditions of our model, protein jumping is effective
as a mechanism to destroy the redundancy of a diffusive
1D search only if the DNA dynamics is sufficiently
fast compared to the timescale between protein jumps
or if many proteins search in parallel. Of course, the in
vivo situation is complicated by many additional factors,
such as the non-random conformation and the confinement
of the DNA. We also found that the "paradoxical"
quasi-diffusive dynamics in the quenched limit [21] can
be viewed as a subtle cancellation of the effect of traps
and long-distance jumps. The interplay between traps,
jumps, and memory in 1D transport is an intricate problem
in statistical mechanics [26]. The protein-DNA system
naturally displays a nontrivial interplay and surprisingly
is tuned to a critical point in our dynamical phase
diagram.

FIG. S1: Illustration of the move set of our kinetic Monte Carlo scheme. The DNA chain is
represented by a path on a cubic lattice (left). The protein is represented by a point particle which
moves at rate $k_p$, either by randomly sliding along the chain contour or by jumping to another
segment of the chain at the same position. The link diagram representation (right) has the DNA
contour stretched out to a line and indicates possible jumps by arcs. Links can be created or
destroyed by the Rouse dynamics of the DNA, which is implemented with a generalized Verdier-Stockmayer
move set allowing for kink flips, turns at the chain end, and crankshaft moves. Each
move is carried out at the rate $k_D$.

FIG. S2: Simulation results for the target search on the dynamic DNA chain (of length $L = 5000$).
The protein was initially placed in the center of the chain and the time was measured until it
first arrived at a target site placed a distance $s$ away. These simulations were performed 1000
times, using different initial polymer configurations, to determine the median of the search time,
which represents a typical search time. Both $s$ and the kinetic ratio $k = k_D/k_p$ of DNA to protein
moves were varied. The simulation of the target search on the dynamic chain is computationally
expensive, which limits our range of $k$ values. (A) The typical search time to a target site at
distance $s = 2000$ as a function of the kinetic ratio $k$. A substantial increase of the search time
with slowing DNA kinetics is apparent. (B) The typical search time plotted against the distance of
the target site, for the different kinetic ratios. The dependence of the search time on the distance
$s$ becomes weaker as $k$ is increased (however, a significant dependence on $s$ remains for all our $k$
values).

FIG. S3: Data collapse to extract an estimate of the scaling behavior of the crossover between
super-diffusive and quasi-diffusive dynamics. The three curves from Fig. 2 (main text) with different finite
kinetic ratios $k$ can be collapsed onto each other by rescaling of the axes (here we have used the
asymptotic value of 2 for the exponent $\alpha$). The best collapse is obtained when the time is rescaled
as $t/k^{\delta}$ with $\delta$ around 0.25. This exponent deviates from the theoretically expected value of 1/3;
however, the deviation is not significant given the finite size effect of our simulations.

FIG. S4: Kinetic Monte Carlo simulation of the protein dynamics on reshuffled link sets. The data
is obtained from simulations using the actual link set of a random DNA configuration and then
randomly reshuffling the positions of these links (while conserving the length of each link). The
protein dynamics is simulated on this reshuffled but temporally fixed link set. Finally the average
dynamics of the width $\Delta(t)$ is obtained by averaging over many initial DNA configurations (each
randomly drawn). The dynamics of $\Delta(t)$ is super-diffusive, showing that the temporal correlations
are not sufficient to produce the quasi-diffusive behavior (see main text).

FIG. S5: Distribution of island sizes. The distribution was obtained by generating random DNA
conformations of length $L$ and picking a single island from each link diagram at random. The
distribution displays the same power law decay as the distribution of link lengths.

FIG. S6: Exit probability from an island to the right as a function of the starting position $x_0$
(normalized here by the island size $s$). The exit probability is shown both for a single randomly
chosen island (blue downward triangles) with the link configuration indicated by the link diagram in
the bottom, as well as averaged over an ensemble of 1000 islands of the same size (black triangles).
For comparison, the case of pure diffusion (no links), where the exit probability depends linearly
on the starting position, is also shown (the solid line shows the analytical dependence while the
circles indicate simulation data, which was obtained as a control using the same simulation code as
for the islands). It is evident that the probability of exiting an island on a given side depends only
weakly on the initial position, at least in the core of the island. This justifies our coarse grained
hopping model in island space.

FIG. S7: Average trapping time of a protein within an island as a function of island size $s$. Each
data point is obtained as an average over many simulations where a protein is initialized in the
center of an island of size $s$ within a randomly drawn DNA configuration, and the time until it
exits from the island is recorded. This island-size dependent characteristic trapping time scales as
$\tau(s) \sim s^{3/2}$.